knowledge or skills that students should have mastered. They are based on 
grade-level curriculum guidelines. Norm-referenced testing is quite different 
in concept. Test developers administer a pilot test to a representative group 
of students, compute average scores, and then compare student achievement on 
the tests to how well the average student performed on the pilot test. 
Performance-based tests are usually paper and pencil tests that are not 
multiple choice. Such tests are used to assess writing skills, computer 
skills, or skills in a field such as the performing arts. Using test results 
and other assessments, the comparison of schools and school districts can be 
problematic. For this reason, the idea of "value added" or gain score 
measurement has been adopted by some states and districts to measure a student against his 
or her own previous performance. This overview of testing and test concepts 
may help parents, teachers, and community members in discussions of the 
various tests administered in a local school district. (Contains 15 
Measuring student achievement 



There is an almost universal acknowledgment that schools do make a difference. To 
believe otherwise is to take so pessimistic a view of the educational process as to be almost 
unconscionable. We have progressed beyond the Elizabethan belief that a person does not (and 
should not) rise above the station or role in life to which he or she was born. That said, there 
remains the task of communicating to students, parents, and the general public sufficient 
information about student learning to aid them in making decisions about how they and their 
schools are doing. There are many ways that both parents and the public look at a school's effect 
on student achievement: the win-loss record of athletic teams, the number of active clubs, 
involved parent-teacher organizations, successful fund-raisers, an award-winning music program, 
number of AP classes, number of students on honor roll (all-A students), discipline, 
promotion-retention, scholarships, and so on (McLean, Snyder, & Lawrence, 1998). All of these 
are quite important to a student's success in school and the school which fails to provide these 
may open itself up to public criticism (and deservedly so) no matter how well its students achieve 
on standardized tests. 

However, those who are charged with the evaluation of program, school, or teacher 
effects often must consider how well students do on the variety of standardized tests which 
students take during the course of the school year. These may be criterion-referenced, 
norm-referenced, or performance-based (i.e., writing and the like). Most states have adopted or 
passed accountability legislation which mandates the type of indicators that will be used to 
evaluate or assess student academic achievement. 

Criterion-referenced tests are tests that measure knowledge or skills which students should 
have mastered. In other words, criterion-referenced tests are based on grade level curriculum 
guidelines. They usually have sections on reading, language, math, science, and social studies. The 
multiple-choice unit tests most people remember taking in school were more likely than not 
criterion-referenced tests as they were based on material that was taught for a specific course and 
grade level. This is a simple enough matter for the classroom teacher, who can construct such a 
test based on what was taught in class and on personal knowledge of the students in the classes. 
For those working at the district, state, or national level it is somewhat more problematic. There is 
no universal curriculum although many states and professional organizations have published 
standards and learning objectives for subject areas and grade levels. Criterion-referenced tests are 
based upon the premise that upon completion of the course or prescribed curriculum, all students 
will meet a certain level of mastery of the material. This is usually considered to be between 75% 
and 80% correct. Students should know what the criteria are and should be able to study for the 
test. Thus, while some small percentage of students are exempt, the assumption is that regardless 
of which school the student attends, his socioeconomic level, ethnicity, ability level, or prior 
preparation, he or she will respond correctly to at least 75% of the questions on the test. The plus 
factor of criterion testing is equality in the level of expectations for all students. Schools which 
teach a large percentage of disadvantaged, at-risk students where many parents may have little or 
no formal education are expected to do as well as those whose students are from highly educated, 
affluent families. The minus is that cut points and standards may then be set low to assure that 
sufficient numbers of students achieve mastery level. There are a variety of statistical methods 
which may be used to make these decisions (McLean & Lockwood, 1996; Millman (Ed.), 1997; 
Popham, 1988; Sanders & Horn, 1995). Criterion-referenced testing may beg the question of 
whether or not some schools have students who are more difficult to teach than others or 
whether some students learn more quickly than others. Even if all students attain mastery it is not 
possible to determine if all students were challenged to do their best, especially if many students 
achieved perfect or near perfect scores. The brightest may have coasted through testing while 
other students may have been severely stressed, the "high stakes test syndrome" (Linn, 2000; 
Madaus, 1993; Wiersma & Jurs, 1990). In criterion-referenced testing the student is measured 
against the test, not against any other student's achievement. If a school which serves a large 
percentage of at-risk students does not have as many students at mastery as a school which serves 
an advantaged student population, does it mean that the school and its teachers are less good than 
the other school, or that the school is providing a poor education for its students? 
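
To make the mastery criterion concrete, the following minimal Python sketch computes the share of students who reach a cut score. The 75% cut comes from the discussion above; the function names, the 40-item test length, and the raw scores for the two schools are hypothetical and serve only as illustration.

```python
def percent_correct(num_correct, num_items):
    """Return the percentage of items answered correctly."""
    return 100.0 * num_correct / num_items


def mastery_rate(raw_scores, num_items, cut=75.0):
    """Return the fraction of students whose percent correct meets the cut."""
    met = [percent_correct(s, num_items) >= cut for s in raw_scores]
    return sum(met) / len(met)


# Hypothetical raw scores (items correct) on a 40-item test.
school_a = [38, 35, 31, 30, 29, 33]
school_b = [30, 28, 25, 31, 22, 36]

rate_a = mastery_rate(school_a, num_items=40)  # 5 of 6 students at mastery
rate_b = mastery_rate(school_b, num_items=40)  # 3 of 6 students at mastery
```

With these invented numbers, five of six students in school A and three of six in school B reach mastery. The calculation reports that difference but cannot by itself say whether it reflects the quality of instruction or the populations served, which is the question raised above.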

Norm-referenced testing is quite different in concept. Generally speaking, test makers 
administer a pilot test to a representative group of students, average scores are computed, and 
student achievement on subsequent tests is compared to how well the average student performed 
on the pilot test. There are a variety of statistical methods used to determine if the test is fair and 
equally difficult for all groups of students (Camilli & Shepard, 1994; Linn & Harnisch, 1981). 
Thus, most students would be expected to have scores in the average range, and students who 
scored at the 75th percentile would have done as well as or better than 75% of all students 
who took the test. In this method the student is measured against how well other students 
performed, not against the test itself. Students are not usually encouraged to study for 
norm-referenced tests as they are constructed to measure a broad range of general knowledge in 
subject areas such as reading, language arts, math, science, and social studies. Whereas the hope 
and expectation is that all students will achieve at least 75% on a criterion-referenced test and 
many will obtain perfect scores, that is not the case with norm-referenced tests. Few if any 
students are expected to achieve perfect scores on norm-referenced tests. The advantage of the 
norm-referenced test is that it is possible for parents and counselors to have some idea of how 
well a student is achieving compared to other students who took the test and thus aid in making 
career or college decisions. As with the criterion-referenced test, if the student population of a 
school is not representative of the student population on which the test was normed, schools with 
a larger percentage of at-risk students than the norm group may have lower than average scores 
while schools with fewer at-risk students may have higher than average scores. Once again, does 
this mean that the school with lower scores but more at-risk students is less good than the school 
with higher scores but fewer at-risk students? 
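
The comparison to a norm group can be illustrated with a short Python sketch of a percentile rank, defined here as the percentage of norm-group students who scored at or below a given score. This is one common convention, not necessarily the one used by any particular test publisher; the function name and the norm-group scores are hypothetical.

```python
def percentile_rank(score, norm_scores):
    """Return the percentage of norm-group scores at or below `score`."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100.0 * at_or_below / len(norm_scores)


# Hypothetical raw scores from a pilot (norming) administration.
norm_group = [12, 15, 18, 20, 21, 23, 25, 27, 30, 34]

rank_high = percentile_rank(27, norm_group)  # 80.0
rank_low = percentile_rank(20, norm_group)   # 40.0
```

The same raw score would receive a different percentile rank against a different norm group, which is why a school whose students differ from the norming population may score above or below average for reasons unrelated to instruction.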

Performance-based tests are usually paper and pencil tests, not multiple choice tests such 
as criterion and norm-referenced tests. Performance-based tests are usually used to assess writing 
skills, computer skills, or skills in the performing arts. Writing assessments are generally scored 
according to certain rubrics, such as organization, content, grammar, and the like. Since these 
tests are most often scored by individuals and not by a machine or computer, it is important that 
assigned scores be consistent, that is, that any given paper would receive the same score no matter 
who scored it, with the same weight given by each to organization, content, grammar, and the like 
(Moore & Young, 1997; Reckase, 1997). Many states have mandated performance-based tests as 
part of their state-wide accountability systems but they are expensive to administer and require 
more man hours and training to be scored properly than do computer scored multiple choice tests. 
Performance-based tests are similar to criterion-referenced tests in that the student is usually 
measured against the test criteria, not against how well other students performed. The plus is that 
all students are expected to perform to certain pre-specified standards, that they have an avenue 
to exhibit creativity not provided by the multiple-choice format, and that they can demonstrate as 
high a level of skill on the assigned task as they wish to or are capable of. The question for 
evaluation and assessment is whether it is reasonable to expect that students who have parents 
who may not write well (if at all), who have few books, no musical instruments, and no computers 
in the home will perform as well as students who have access not just at school but at home to 
these advantages. Thus, are schools and school districts which serve large percentages of 
disadvantaged students and have lower than average performance scores providing a "less good" 
education than more fortunate schools and school districts, and how can that be determined? 

Since most school districts are not completely homogeneous, comparisons of schools and 
school districts can be quite problematic. Is it possible to ascertain in a fair way whether or not a 
particular school or school system is doing as well as it could with its unique percentages of 
at-risk or other identified sub-groups of students? To that end the concept of "value-added" or 
gain score measurement has been adopted by several states such as Tennessee and school districts 
such as Dallas Public Schools, among others (Millman (Ed.), 1997). Value-added or gain scores 
can be computed for either multiple-choice or for performance-based tests except that portfolio 
assessment is more commonly used for performance-based tests. Rather than being compared to a 
norm group or to a criterion, the student is measured against his or her own prior average 
achievement gain score. It is expected that, with a well-designed curriculum and all other things 
being equal, a student will learn the same amount of material at the same rate of speed from one 
year to the next, and that variation in the average amount of material learned is due to the 
influence of the school and the teachers. Thus, students who learn at a slower pace than others 
will be compared to their own previous average year's gain and students who absorb information 
more quickly are compared to their own previous average year's gain. This method has been 
rather hotly debated on several grounds: (a) the "ceiling effect" in which students who have 
topped out on a test cannot achieve a gain score because they are already performing at the 
highest possible level; (b) the "floor effect" in which students at the lowest end have nowhere to 
go but up and thus may have misleadingly high gain scores (Slavin, 1992); (c) a within school 
effect that would not become apparent until a cohort of students changed schools and the group 
gain scores decreased or increased according to which school they next attended (Sanders et al., 
1994); and (d) teaching to the test on a norm-referenced test is contrary to the concept of 
norm-referenced testing and results in scores that are not comparable to those of the norm group 
if they did not receive instruction prior to testing (Shepard, 1990). Thus district administrators, 
parents, and community members should be aware that comparisons of schools and school 
districts are often fraught with pitfalls, particularly when attempting to compare apples with 
oranges. For this reason many systems have a variety of accountability mechanisms in place to aid 
decisionmakers. 
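
The gain score idea and the ceiling effect described above can be sketched with simple arithmetic in Python. This is only an illustration under stated assumptions: the function names, the 100-point maximum, and the score histories are hypothetical, and the state and district systems cited rely on more elaborate statistical models than a difference of averages.

```python
def yearly_gains(scores):
    """Return the year-to-year differences in a student's scores."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


def value_added(scores):
    """Return the latest gain minus the average of the student's prior gains."""
    gains = yearly_gains(scores)
    if len(gains) < 2:
        raise ValueError("need at least three yearly scores")
    prior = gains[:-1]
    expected = sum(prior) / len(prior)
    return gains[-1] - expected


MAX_SCORE = 100  # hypothetical ceiling of the test

# Hypothetical four-year score histories for two students.
slower_learner = [40, 46, 52, 60]   # gains of 6, 6, 8
faster_learner = [70, 82, 94, 100]  # gains of 12, 12, 6 (capped at MAX_SCORE)

slower_result = value_added(slower_learner)  # 2.0
faster_result = value_added(faster_learner)  # -6.0
```

In this invented example the faster learner shows a negative result only because the last score reached the test maximum, so the apparent loss reflects the ceiling of the instrument and not a decline in learning.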

While this overview may perhaps be somewhat simplistic, nevertheless parents, teachers, 
and community members who are not professionals in the area of K-12 student testing may find 
this information helpful as a starting point in the discussion of the various state, local, and national 
tests administered by their child's school district. 